ABSTRACT

The  Digital  Optical  Computing  Program  within  the  National  Science  Foundation 
Engineering  Research  Center  for  Optoelectronic  Computing  Systems  has  as  its  specific 
goal  research  on  optical  computing  architectures  suitable  for  use  at  the  highest  possible 
speeds.  The  program  can  be  targeted  toward  exploiting  the  time  domain  because  other 
programs  in  the  Center  are  pursuing  research  on  parallel  optical  systems,  exploiting 
optical  interconnection  and  optical  devices  and  materials.  Using  a  general  purpose 
computing  architecture  as  the  focus,  we  are  developing  design  techniques,  tools  and 
architectures  for  operation  at  the  speed  of  light  limit.  Experimental  work  is  being 
done with the somewhat low speed components currently available but with architectures
which will scale up in speed as faster devices are developed. The design algorithms
and tools developed for a general purpose, stored program computer are being
applied  to  other  systems  such  as  optically  controlled  optical  communications  networks. 


† Research was supported in part by the National Aeronautics and Space Administration under NASA Contract No. NAS1-
Center which is funded in part by the National Science Foundation under the Engineering Research Centers program grant No.
CDR  8622236  and  by  the  Colorado  Advanced  Technology  Institute  (CATI),  an  agency  of  the  State  of  Colorado. 


INTRODUCTION 

The Digital Optical Computing Program within the Optoelectronic Computing Systems
Center at the University of Colorado at Boulder is centered around the design and
construction of an "all-optical," general purpose, stored program, digital computer. It is
"all-optical" in the sense that logic level components have only optical inputs and
outputs, with all inter-component signals restricted to light. It is digital because this type of
operation has proven successful in representing both arbitrary precision numbers and
control information. The computer science term, stored program, means that instructions
are stored as data to be manipulated by the computer itself. Thus it can "write its own
programs" by, for example, running compilers. Finally, "general purpose" implies an
instruction set which supports both the symbolic processing needed to manipulate
programs and numeric computation. The design is bit serial to minimize the number
of active optical devices. Fiber delay lines are used for storage because they are passive
elements, suited for storing serial information. Waveguide switches using the
electro-optic effect are used to do logic. The bit serial design uses bandwidth, or time domain
capacity, to achieve processing power. Since terabits per second are possible in one
optical channel, much complexity can be put into the time domain, making possible
prototypes with few components. To minimize active elements, we have adopted a simple but
complete instruction set without floating point arithmetic. Instructions have one address
with no complex addressing modes. A carefully optimized design gives a complete
computer using only a few tens of switches. Optical fibers form all memory and
interconnecting components. There are no synchronizing elements such as flip flops, so all signal
storage is in passive fiber delays. More important than demonstrating an optical
computer is gaining more understanding of the use of the time domain in computer
architecture and of time-space trade offs. Another goal is to transfer digital electronics
knowledge to optics. There may be new ways to use optics which have no analogs in
electronics, but it would be unwise to assume either a complete break with the extensive
knowledge base in digital computing or that it all transfers unchanged to
optics.

Prior work in optics which applies most directly to the current work is in
communications and signal processing. Single and multi-mode optical fiber and connector
systems have been developed and commercialized[1]. Static directional couplers are
available with specified power splitting ratios and can be used for fan-out or for combining
noninterfering signals. Electrically switched directional couplers[2] have reached a
reasonable degree of maturity, and are available from more than one source. They are used
for modulation, multiplexing and demultiplexing[3] of optical communications signals.
To get a component with all inputs and outputs optical, we add a photodetector,
amplifier and electrode driver to allow the switch to be optically controlled. The above
devices are used as shown in Fig. 1 to provide an implementation domain for digital
optical computing. It uses intensity encoding of bits and synchronous operation. A light
pulse at a clock time is a logic one, and no light represents a zero. The waveguide switch
computes the multiplexer function shown. Interconnection is done with single mode
fiber and fanout by 3 dB fiber couplers, which are also used to merge signals from two
sources when only one source carries a signal at a time. Memory is accomplished
entirely by the propagation time of optical pulses in fiber. The delay schematic shown in
the figure represents a coil of fiber of delay KΔ.

Figure 1: Fiber and Waveguide Switch Implementation Domain.


For  our  purposes,  an  electro-optically  switched  directional  coupler  can  be  viewed  as 
a  controlled  exchange  element  with  two  optical  waveguide  inputs,  the  signals  on  which 
can be copied directly or exchanged onto two optical waveguide outputs. The direct
("bar") state or the exchange ("cross") state is selected by an electrical potential. The coupler's
physical  properties  can  be  summarized  at  the  systems  level  by  loss  and  crosstalk  from 
inputs  to  outputs  in  both  states.  Loss  can  be  kept  under  5dB  and  crosstalk  less  than  -20 
dB.  Optical  fiber  has  an  index  of  refraction  of  about  1.5,  which  implies  a  distance-time 
correspondence  of  20  cm/nanosecond.  At  a  wavelength  of  1300  nanometers,  losses  of  a 
few  tenths  of  a  dB/km  are  achievable  with  low  chromatic  dispersion.  Standard  connector 
technology  yields  1/2  dB  or  less  loss  per  connection.  See,  for  example,  Cherin[4]. 
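
The 20 cm/nanosecond correspondence follows directly from c/n, and it fixes the fiber length needed for any stored delay. The short Python sketch below works through that arithmetic. The function names and the 100 MHz example clock are illustrative assumptions, not values taken from the experimental system.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum

def fiber_speed_cm_per_ns(index=1.5):
    """Propagation speed in fiber, in cm per nanosecond (c / n)."""
    return C_VACUUM_M_PER_S / index * 100.0 / 1e9

def fiber_length_m(delay_bits, clock_hz, index=1.5):
    """Fiber length in meters that stores `delay_bits` clock periods."""
    delay_s = delay_bits / clock_hz
    return C_VACUUM_M_PER_S / index * delay_s

# Roughly 20 cm/ns for an index of about 1.5.
print(round(fiber_speed_cm_per_ns(), 1))

# A 16 bit delay loop at an assumed 100 MHz clock, then at a clock ten times
# faster: the required fiber shrinks by the same factor of ten.
print(round(fiber_length_m(16, 100e6), 2))
print(round(fiber_length_m(16, 1e9), 2))
```

The last two lines illustrate why fiber lengths scale inversely with clock rate: the stored bit count is unchanged while the physical loop shortens in proportion.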

The  photodetector  and  electronics  encapsulated  in  the  logic  component  limit  its 
speed,  so  the  impact  of  this  limit  on  our  work  must  be  assessed.  Our  emphasis  is  on 
architecture,  so  the  question  is  whether  useful  concepts  and  techniques  can  be  developed 
in  spite  of  the  limitation.  Logic  is  done  entirely  by  the  waveguide  switches,  and  any 
other  logically  complete  optical  component  can  be  used  with  little  impact  on  the  system. 
As  far  as  interconnections  and  delays  are  concerned,  the  clock  frequency  can  be 
increased  simply  by  scaling  down  all  fiber  lengths.  This  is  the  essence  of  the  concept  of 
a speed scalable architecture. At the architectural level, all physical time constants other
than the speed of light are encapsulated in the switching component. The speed of a
system with a speed scalable architecture can be increased by replacing the logic element
with one m times as fast and scaling down all fiber lengths by a factor of m. The
architecture remains unchanged.

To understand the potential value of speed scalable architectures, one can extrapolate
system speeds for devices which are still in the research stage. The time domain
capacity of optical data transmission is important because transmission is becoming more
of a limit to computer speed than switching. It is physically possible to produce and
propagate 10 femtosecond optical pulses, which translates to a bandwidth of 100
terabits/second. Haner[5] has actually demonstrated 100 femtosecond resolution in a
time compressed waveform, which promises that 10 terabits/second may actually be
achievable. A fast, logically complete optical switch has been demonstrated by Islam[6],
who built NOR gates using 300 femtosecond solitons. His gates show that optical
switching and transmission may attain similar speeds. A smaller, but significant, speed
improvement is expected from integrated electro-optic switches, waveguides, detectors
and electronics in a III-V materials system[7].

Using such bandwidths requires a speed scalable architecture. The architectural
drivers  implied  by  speed  scalability  are: 

all  inter-component  signals  are  restricted  to  light; 

there  are  no  synchronizing  memory  elements; 

synchronization  is  done  by  controlling  optical  delays; 

optical  signal  quality  must  be  restored,  both  in  amplitude  and  timing; 

any  logically  complete  device  can  substitute  for  the  switches. 

Although  a  general  purpose  digital  computer  focuses  the  work,  it  is  not  expected  to  be 
the first competitive success of optical architectures. A near term application is in
optically switched optical communication networks. Packet switching requires only a small
amount of logic. Even simple optical state machines can improve high speed
communications systems, which now require conversion between optics and electronics for the
simplest  switching.  High  speed  controllers  in  hostile  environments,  such  as  particle 
accelerators,  are  also  a  potential  application.  In  general  purpose  computing,  optics  will 
complement  electronics  long  before  it  replaces  it.  Time  (or  frequency)  multiplexing  can 
make  high  speed  serial  systems  effective  adjuncts  to  slower,  parallel,  electronic  ones. 


BUILDING  BLOCKS 

A digital computing architecture must include logic, interconnection, signal
restoration and memory. The ability to restore signals in both amplitude and timing is not
strictly a logical function, since it depends on the physical characteristics of signals. It
can be accomplished with a switch component by gating the system clock as shown in
Fig. 2. Amplitude is restored because the incoming optical pulse is physically switched
to the output. For timing restoration to be effective, the control signal must arrive earlier
than the clock and then be amplified and broadened in order to switch the full, correctly
timed clock pulse to the output, while the second output receives a restored
complemented copy of the control signal. Clock gating was used in electronic computers to
restore timing. Here it also restores the optical power level. This makes supplying
optical power a problem of producing multiple copies of a synchronized clock.

Figure 2: Signal Restoration in Amplitude and Timing.

The multiplexer function, $D = AC + B\bar{C}$, shown for the switch of Fig. 1, is logically
complete given the constants zero and one. In the pulse coded representation, zero is the
absence of light, and one is a copy of the system clock. A circuit with both logic and
memory is the serial binary adder, which will add two binary numbers presented to its
inputs low order bit first. It consists of a single full adder and a one clock period delay to
store the carry used in computing the next higher order sum bit. Figure 3 shows the
circuit built from waveguide switches, 3 dB couplers for fanout, and a fiber delay for the
carry. The switches connected to the inputs not only complement them but also do signal
restoration. Switches S3 and S4 demonstrate AND, OR and exclusive OR functions.


Figure 3: Serial Binary Adder with Carry Delay.

The circuit is independent of the length N of the binary numbers, but end conditions,
such as initializing the carry delay to a zero or discarding the high order carry, require
more switches and timing signals to mark word boundaries.
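
The logical completeness of the multiplexer and the behavior of the serial adder can be illustrated with a small behavioral model. The Python sketch below is not the switch-level circuit of Fig. 3. It assumes an idealized multiplexer that passes input A when the control C is one and input B otherwise, builds complement and exclusive OR from it using the constants zero and one, and models the carry delay as a single variable. All function names are hypothetical.

```python
def mux(a, b, c):
    """Idealized waveguide switch: output D = A when C is 1, else B."""
    return a if c else b

def inv(x):
    """Complement built from a multiplexer and the constants 0 and 1."""
    return mux(0, 1, x)

def xor(x, y):
    """Exclusive OR: pass NOT y when x is 1, else pass y."""
    return mux(inv(y), y, x)

def full_adder(x, y, carry):
    """One-bit full adder built only from mux(), inv() and xor()."""
    differ = xor(x, y)
    total = xor(differ, carry)
    # If the inputs differ the carry propagates, otherwise it equals x.
    carry_out = mux(carry, x, differ)
    return total, carry_out

def serial_add(xs, ys):
    """Add two equal-length bit lists presented low order bit first."""
    carry = 0  # end condition: carry delay initialized to zero
    out = []
    for x, y in zip(xs, ys):
        total, carry = full_adder(x, y, carry)  # carry delayed one bit time
        out.append(total)
    return out  # end condition: high order carry is discarded

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

print(from_bits(serial_add(to_bits(5, 8), to_bits(9, 8))))
```

The same loop works for any word length, which mirrors the observation that the circuit is independent of N apart from the end conditions.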

Memory registers extend the carry delay of the binary adder with signal
regeneration and with read and write access to the register. Figure 4 shows a K bit register. Switch S1
regenerates data on every circulation through the loop, or once every K clock periods.
With switch S2 in the cross state, a one bit emerging from the K bit storage loop causes
switch S1 to copy a clock pulse into the loop. Zeros route the unconnected input B of S1
to the D output. The 3 dB coupler allows the register to be read and switch S2 provides
the ability to write new information by holding the Write input at logic one for K bit
periods. When such a delay loop is used as a register in a serial machine, its length is
usually equal to the computer word length, and its contents can be read or written once
per word time.
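
A recirculating register of this kind can be modeled behaviorally as a fixed-length queue. The following Python sketch is an assumption-laden illustration rather than a description of Fig. 4: it treats the loop as a K element delay line, takes the read tap at the point where bits emerge, and reduces regeneration to re-emitting a clean one or zero. The class and parameter names are hypothetical.

```python
from collections import deque

class LoopRegister:
    """Behavioral model of a K bit recirculating delay-line register."""

    def __init__(self, k):
        self.loop = deque([0] * k)  # K bit times of fiber delay

    def tick(self, write=0, data_in=0):
        """Advance one clock period and return the bit at the read tap."""
        emerging = self.loop.popleft()
        # Write held at one selects new data; otherwise recirculate.
        selected = data_in if write else emerging
        # Regeneration: a one gates a fresh clock pulse into the loop,
        # a zero lets no light through.
        self.loop.append(1 if selected else 0)
        return emerging

reg = LoopRegister(4)
for bit in [1, 0, 1, 1]:          # hold Write at one for K bit periods
    reg.tick(write=1, data_in=bit)
first_read = [reg.tick() for _ in range(4)]
second_read = [reg.tick() for _ in range(4)]
print(first_read, second_read)
```

Reading does not disturb the contents, so the word reappears once per word time until it is overwritten.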

Multiple word memories use the same kind of storage loop shown in Fig. 4. With
K bits per word, the length of an N word memory loop is NK bits. A scale of N counter
incremented once per word determines which word is currently passing switch S2, where
it can be read or written. This counter requires $m = \lceil \log_2 N \rceil$ bits. If m < K, the m bit
counter can be incremented during the passage of one word. To access a word, its
address is compared to the counter value at each word time until a match occurs. A large
memory can have several loops and an address of two parts, one of which selects a loop,
while the other is compared to a counter. The number and size of loops is determined by
the acceptable waiting time for an address match and by the physical limits on the loop
capacity. Sarrazin[8] has examined the several physical limitations on memory loop
capacity to establish a resolution of one part in $10^4$ per degree C for a synchronous
storage loop.
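
The counter width and the waiting time for an address match are simple to compute. The Python sketch below is a minimal illustration under the assumption that words pass the access switch in increasing address order, one per word time. The function names are hypothetical.

```python
def counter_bits(n_words):
    """Bits needed for a scale of N counter: the ceiling of log2(N)."""
    return (n_words - 1).bit_length()

def words_until_match(address, counter, n_words):
    """Word times to wait until the addressed word passes the switch."""
    return (address - counter) % n_words

def loop_length_bits(n_words, bits_per_word):
    """Total loop length NK in bit times."""
    return n_words * bits_per_word

print(counter_bits(1024), loop_length_bits(1024, 16))
print(words_until_match(address=2, counter=5, n_words=8))
```

The worst case wait is N - 1 word times, which is one reason a large memory may be split into several shorter loops.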

The serial counter design will be referred to several times. Figure 5 is a block
diagram of a four bit, scale of 16 serial counter. On the left is a four bit increment signal
consisting of a one in the low order bit position and three zeros. Below the half adder is
a stored four bit count value, with low order bit at the left. Above the half adder is a
carry bit. It and the count are stored in delay lines of one and four bit durations,
respectively. Use of an m bit delay line and placing the increment signal at the low order bit of
an m bit period are the only changes required to make it an m bit scale of $2^m$ counter.
Figure 6(a) shows a logic description of the counter and its optical design is shown in
Fig. 6(b). The signal labeled Clk is an optical oscillator with pulses appearing once
every bit time. The signal Wck times a word of the binary count by producing a pulse
every four bit times. Including two switches required to derive Wck from Clk, the
complete design requires only five switches, making it a simple implementation target.

Figure 5: Block Diagram of a Bit Serial Counter.

Figure 6: Implementation of the Bit Serial Counter.
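
The counter's behavior can be checked with a bit-level model. The Python sketch below is an idealized, lumped-delay illustration and not the optical design of Fig. 6(b): the count delay line is a list with the low order bit first, the one bit carry delay is a variable, and the carry is assumed to be cleared at each word boundary. The function name is hypothetical.

```python
def serial_counter(m, words):
    """Return the count after each of `words` word times (m bit counter)."""
    count = [0] * m  # m bit delay line, low order bit first
    values = []
    for _ in range(words):
        carry = 0  # carry delay assumed empty at the start of each word
        next_count = []
        for position, bit in enumerate(count):
            # Increment signal: a one in the low order bit slot only.
            increment = 1 if position == 0 else 0
            addend = increment | carry
            next_count.append(bit ^ addend)  # half adder sum
            carry = bit & addend             # half adder carry, one bit late
        count = next_count
        values.append(sum(b << i for i, b in enumerate(count)))
    return values

print(serial_counter(4, 18))
```

With m = 4 the sequence wraps from 15 back to 0, as expected of a scale of 16 counter. Changing only `m` gives a scale of 2^m counter.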

The  design  of  Fig.  6(b)  neglects  a  fundamental  problem  of  speed  scalable  design. 
Delays  in  the  circuit  are  taken  to  be  zero  except  for  those  associated  with  explicit  delay 
elements.  Delays  are  actually  distributed  throughout  the  circuit,  in  connections,  switches 
and  electrode  drivers.  The  delay  distribution  problem  is  to  distribute  delays  to  coordinate 
signal  arrival  times  at  the  inputs  to  a  switch.  It  is  a  result  of  interconnection  delay  being 
of  the  same  order  as  logic  delay  and  the  absence  of  flip  flop  synchronization.  Lumped 
delay  designs  using  familiar  digital  design  techniques  must  be  transformed  into  realistic 
ones with delays which meet the physical requirements of components and layout.
Otherwise, signals to be logically combined will not arrive simultaneously at the proper logic
element.

Figure 7 shows a two switch circuit to derive the word clock, Wck, for the counter
of Fig. 6 from the master clock, Clk. Part (a) shows the lumped delay design, which
produces one pulse out for every four in, and part (b) shows the design with a delay
associated with each signal path. Also shown are two equations which ensure that
corresponding inputs arrive simultaneously at switches S1 and S2, respectively, and inequalities
which characterize the minimum delays in the paths between outputs and inputs. The
delays $\delta_A$, $\delta_B$, $\delta_C$, $\delta_D$ and $\delta_E$ are associated with the five terminals of a waveguide
switch, while $\delta_S$ is associated with the fixed 3 dB coupler used as a signal splitter. By
adding length to the interconnections and adjusting the lumped delays, the equations and
constraints can be satisfied, provided no feedback loop has an inherent physical delay
longer than the specified lumped delay. Since the minimum lumped delay in a feedback
loop for a non-trivial sequential circuit is usually one clock period, the "optical length" or
latency of a switching element puts a lower limit on the clock cycle time. The latency is
not necessarily related to the bandwidth of an element. The topic of time multiplexed
architectures, which make use of high bandwidth logic in spite of long latency, will be
discussed later.
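
The feasibility condition on feedback loops can be turned into a lower bound on the clock period. The Python sketch below is a simplified illustration: each loop is described only by its inherent physical delay and by the lumped delay, in clock periods, that the logic design specifies for it. The example numbers and function names are assumptions made for illustration and do not come from Fig. 7.

```python
def min_clock_period_ns(loops):
    """Smallest clock period for which every loop's physical delay fits.

    `loops` is a list of (physical_delay_ns, lumped_delay_in_periods).
    """
    return max(physical / periods for physical, periods in loops)

def padding_ns(loops, clock_ns):
    """Extra delay to add to each loop so it equals its lumped delay."""
    return [periods * clock_ns - physical for physical, periods in loops]

# Hypothetical loops: 2.5 ns of hardware in a one period loop,
# 6.0 ns of hardware in a four period loop.
loops = [(2.5, 1), (6.0, 4)]
period = min_clock_period_ns(loops)
pads = padding_ns(loops, period)
print(period, pads)
# At about 20 cm of fiber per nanosecond, padding converts to length.
print([round(p * 20.0) for p in pads])
```

The one period loop sets the clock rate here, and the slack in the longer loop is taken up by added fiber.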


SYSTEM  ARCHITECTURES 

At this point essential building blocks for a stored program computer (logic,
registers and multi-word memory) have been discussed. The experimental question is
whether a complete, stored program computer can be built with few enough switches to
be feasible as a near term prototype. The current cost and size of available waveguide
switches imply that a computer requiring hundreds to thousands of them would remain
only a paper design yielding no practical experience with speed scalable architectures.
The SCAMP[9] architecture is a carefully minimized design containing all of the features
of a general purpose computer except input/output. I/O will initially be supplied by the
monitor subsystem[10] necessary to control and make measurements on the computer.
For minimality, general registers are represented by a single accumulator. It, the
program counter, the instruction register and the memory counter are the only registers
accessible every word time.

(a) Lumped Delay Design

(b) Distributed Delay Design

Figure 7: The Delay Distribution Problem.

Along  with  minimizing  registers,  the  instruction  set  is  also  kept  small.  Multiply, 
divide  and  floating  point  arithmetic  are  left  for  software,  although  preliminary  work  on 
multiply[11] and divide[12] hardware is in progress. The arithmetic logic unit (ALU) is
limited  to  and,  or,  not,  add  and  shift.  This  has  sufficed  for  many  microprocessors  and  is 
a  reasonable  first  step.  The  design  proceeded  in  two  phases.  First,  logic,  registers  and 
memory  were  assembled  under  the  assumption  that  the  waveguide  switches  implemented 
perfect  multiplexers.  The  complete  design  required  only  about  50  waveguide  switches. 
The  second  phase  used  measured  loss  and  crosstalk  values  to  determine  where  to  place 
signal  restorers  to  meet  the  physical  specifications  of  the  switches  and  photodetectors. 
This  second  phase  resulted  in  a  design  using  about  75  switches.  Although  the  design 
uses  a  16  bit  word  length,  only  delay  line  lengths  change  to  accommodate  any  word 
length  which  is  no  shorter  than  the  memory  address  length  plus  six  bits.  Simulations 
verified  the  design  at  both  the  logic  level  and  the  physical  level. 

The soliton gates[6] cited as a demonstration of very high speed logic used 20
meters  of  fiber  to  obtain  sufficient  interaction  length  given  the  weak  nonlinearity  of  glass. 
Such  extreme  ratios  of  reciprocal  latency  to  bandwidth  are  not  expected  in  mature  optical 
devices,  but  interaction  lengths  of  a  few  centimeters  in  terahertz  bandwidth  gates  would 
not  be  surprising  because  of  the  large  power  densities  which  would  otherwise  be 
required.  Although  long  latency  limits  the  minimum  feedback  loop  length,  such  gates 
would  have  the  potential  for  hundred-fold  time  multiplexing  or  pipelining.  Decoupling 
the  duration  of  a  switched  pulse  from  the  latency  of  the  switching  element  by  this  means 
opens  up  important  possibilities  for  optical  logic  devices. 
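
The relation between latency, pulse duration and the available degree of time multiplexing is a matter of simple arithmetic. The Python sketch below uses the 20 cm/nanosecond figure for fiber. The 2 cm interaction length and 1 picosecond slot are assumed example values standing in for "a few centimeters" and "terahertz bandwidth", and the function names are hypothetical.

```python
def latency_ns(length_m, cm_per_ns=20.0):
    """Transit time through an interaction length of fiber."""
    return length_m * 100.0 / cm_per_ns

def pulses_in_flight(length_m, slot_ps, cm_per_ns=20.0):
    """Number of bit slots that fit inside the device latency."""
    return latency_ns(length_m, cm_per_ns) * 1000.0 / slot_ps

# The 20 meter soliton gate: latency of about 100 ns.
print(round(latency_ns(20.0), 3))

# An assumed 2 cm gate with 1 ps slots: on the order of a hundred
# independent pulses can be in transit at once.
print(round(pulses_in_flight(0.02, 1.0)))
```

The second figure is the sense in which such gates offer roughly hundred-fold time multiplexing or pipelining.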

An architectural technique to make use of devices with high bandwidth but long
latency decouples consecutive bits by time multiplexing serial data streams. One can
multiplex several bit streams on the hardware of the SCAMP to yield several
independent computers. Such a time multiplexed multiprocessor requires multiplexing of
processor inputs and demultiplexing of outputs, as shown in Fig. 8. Since multiplexing and
demultiplexing do not require feedback, they can be implemented with long latency
devices. Time multiplexed multiprocessors have been built with electronics. An early
commercial one implemented the ten peripheral processors of the CDC 6600[13], and a
more recent pipelined multiprocessor, the Denelcor HEP[14], multiplexed up to 128
instruction streams on one set of processor hardware. Pipelined vector units in current
supercomputers use time multiplexing of independent vector components to achieve high
speed. Latency tolerance is incorporated at the level of numeric operations in arithmetic
pipelines and systolic arrays, but only in the highest speed designs is it a gate level
concern of the sort addressed by the delay distribution problem. More research is needed on
trade-offs and optimizations possible in designing systems pipelined at the gate level,
especially if such designs use no latches. The elimination of latches is not intrinsically
desirable, but since latching implies a device entering a stable state, and since time
constants associated with stable states are long compared to those of unstable states, the
highest speed designs may well avoid latches.

Figure 8: Time Multiplexed Multiprocessor.

The immediate future of high speed optical logic is probably in communications
rather than general purpose computing. Packet switched communication networks,
controlled by information contained in the data being transmitted, have great utility. Since
information in high speed fiber networks arrives and departs in optical form, optically
controlled optical switching would benefit such networks. Since network control logic
can be simple, and the network is extended in space, existing expensive and large
waveguide switches are not as severe a limitation as in general purpose computing. A
specific architecture for a high speed, packet switched, optical communications network
is being studied[15]. It is based on three interacting ideas: 1) a network of $N \log N$ nodes
of fixed fan-in and fan-out in which a message needs to pass through only order $\log N$
intermediate nodes to reach its destination; 2) "hot potato" routing, in which messages
are not stored in intermediate nodes; and 3) optical compression of data packets to
release bandwidth for use in network synchronization and control. Nodes in the network
not only do switching but are associated with hosts which originate and consume
messages. A ShuffleNet[16] network of $N \log_2 N$ nodes with two inputs and two outputs per
node and a maximum distance of $2\log_2 N - 1$ between any two nodes will be used. When
two incoming messages need to use the same output port, electronic switching nodes
typically store one of the conflicting messages for later transmission. Rather than do
electro-optic conversion for storage, the network uses "hot potato" or deflection routing
to send one of the messages through the wrong output port to make its way by another
path to its destination. Finally, conflicts are minimized and synchronization is simplified
if message packets are separated in time by large gaps. Time compression of the optical
data by wavelength multiplexing or grating techniques can create such gaps, thus trading
potential data bandwidth for ease of network control.
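
The node count and maximum distance quoted for the ShuffleNet can be checked by constructing a small instance and measuring it. The Python sketch below assumes the usual perfect shuffle wiring, which the text does not spell out: nodes are arranged in k columns of N = 2^k rows, and node (column, row) feeds the two nodes in the next column whose rows are 2·row and 2·row + 1 modulo N, with the last column wrapping to the first. The function names are hypothetical.

```python
from collections import deque

def shufflenet(k, p=2):
    """Adjacency lists for an assumed (p, k) ShuffleNet: k columns, p**k rows."""
    rows = p ** k
    return {
        (col, row): [((col + 1) % k, (row * p + j) % rows) for j in range(p)]
        for col in range(k)
        for row in range(rows)
    }

def diameter(graph):
    """Largest shortest-path hop count over all ordered node pairs."""
    worst = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        assert len(dist) == len(graph)  # every node is reachable
        worst = max(worst, max(dist.values()))
    return worst

for k in (2, 3, 4):
    net = shufflenet(k)
    n = 2 ** k
    print(k, len(net), n * k, diameter(net), 2 * k - 1)
```

For each k the measured node count equals N log2 N and the measured diameter equals 2 log2 N - 1, in agreement with the figures in the text.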

Another  important  architecture  uses  fiber  delays  and  exchange  switches  for  time  slot 
interchange.  This  time  domain  permutation  is  useful  both  in  accessing  information  from 
a  serial  delay  line  in  an  order  different  from  that  in  which  it  is  stored  and  to  allow  the 
time  multiplexed  independent  processors  of  Fig.  8  to  exchange  information.  Time  is 
divided  into  slots  containing  information,  with  a  frame  consisting  of  N  sequential  time 
slots.  Time  slot  interchange  means  moving  information  from  slots  of  an  input  signal  into 
slots in different relative positions of the output frame; such a permutation is associated
with a frame delay. A time slot interchange architecture has immediate application in
time multiplexed telecommunications channels. Time multiplexed signals are most often
switched by demultiplexing into separated channels, switching in the space domain, and
re-multiplexing the result. Thompson's[17] architecture uses waveguide switches to
demultiplex an input stream into individual time slots, uses fiber loops to individually
delay them, and uses more switches to multiplex them into the output stream. Leaving
out switches needed to vary the delays, 2N - 2 switches are used in the multiplexer and
demultiplexer.

Ramanan[18] applied techniques developed for multistage switching networks in
the space domain to time domain permutation. The basic building block of the
architecture uses a switch connected to a delay loop of size Δ in a feedback configuration to
selectively interchange pairs of time slots separated by a fixed time Δ, a multiple of the
slot time. Figure 9 shows the situation for a Δ of one slot time. Any number of pairs can
be interchanged by setting the control for exchange (x) for all time slots except the
second of a pair to be exchanged, for which it is set for straight connection (=). The
Benes[19] network, with $2\log_2 N - 1$ stages of exchange switches, is a universal space domain
switch. Ramanan's time domain analog of this network can perform any time slot
permutation on a frame of $N = 2^k$ slots with only $2\log_2 N - 1$ of the above building blocks.
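
The control rule for the building block can be tried out on a single frame. The Python sketch below is an idealized model under stated assumptions: the delay loop is a queue of Δ slots, the exchange (x) state sends the input into the loop and the loop output to the line, and the straight (=) state passes the input directly while the loop contents recirculate. Only one isolated frame is simulated, and the function and parameter names are hypothetical.

```python
from collections import deque

def exchange_block(frame, delta, pairs):
    """Swap slots i and i + delta for each i in `pairs`.

    `pairs` lists the first slot of each pair; pairs must not overlap and
    i + delta must lie inside the frame. Output is delayed by delta slots.
    """
    n = len(frame)
    straight = {i + delta for i in pairs}  # second slot of a pair gets '='
    loop = deque([None] * delta)           # fiber loop of delta slot times
    out = []
    for t in range(n + delta):
        incoming = frame[t] if t < n else None
        from_loop = loop.popleft()
        if t in straight:
            out.append(incoming)    # '=': input goes straight to output
            loop.append(from_loop)  # loop contents go around once more
        else:
            out.append(from_loop)   # 'x': loop output goes to the line
            loop.append(incoming)   # input enters the loop
    return out[delta:]              # discard the initial frame delay

print(exchange_block(list("abcd"), 1, [0]))
print(exchange_block([0, 1, 2, 3], 2, [0, 1]))
```

With no pairs selected the block simply delays the frame by Δ slot times, which is the frame delay that accompanies the permutation.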

One  block  with  delay  loop  of  length  N/2  can  selectively  exchange  any  pair  of  slots 
separated  by  N/2  units.  The  frame  suffers  an  overall  delay  of  N 12  slot  times.  If  we  now 
use  an  N/2  exchange  switch  at  both  input  and  output  of  a  universal  interchanger  for 
frames  of  length  N/2,  as  shown  in  Fig.  10,  we  have  a  recursive  construction  for  a  univer¬ 
sal  interchanger  of  length  N .  The  input  stage  allows  time  slots  to  be  selectively 
exchanged  between  first  and  last  half  frames,  the  center  section  permutes  each  half  frame 
arbitrarily,  and  the  output  stage  again  allows  selective  exchange  of  pairs  between  half 
frames.  This  is  sufficient  to  apply  the  Benes  looping  algorithm[20]  to  show  that  if  the 
center  can  permute  frames  of  N/2  slots,  the  whole  network  can  permute  frames  of  length 
N. If N = 2^k is a power of two, continuing the recursion until a one block exchanger for
adjacent slots is left in the center yields a general time slot interchanger with 2 log2 N - 1
switches and delay loops, as shown in Fig. 11. An alternative design in which the delays
increase toward the center as powers of two is also possible but more difficult to
describe. Thompson’s design requires 2N - 2 switches for the demultiplexer and multiplexer
alone. For permuting 1024 time slots, the new design requires 19 switches compared
with more than 2046 for the other architecture. This new architecture shows how
optics can give insight into time-space tradeoffs which may even have advantages for
electronic implementation. Since time slot interchange forms a large fraction of all
telecommunications switching, the practical value of the result may be large.

Figure 9: Exchange of Time Slot Pairs

Figure 10: Recursive Construction of a General Time Slot Interchanger.

Figure 11: A Time Slot Interchanger with 2 log2 N - 1 Switches.
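
The switch counts quoted for the recursive construction can be checked with a short sketch. The function names are illustrative and do not come from the original work, and each stage is represented only by its exchange distance in time slots.

```python
def interchanger_stages(n):
    """Exchange distances, input to output, for a frame of n = 2**k slots."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    if n == 2:
        return [1]  # one block exchanger for adjacent slots in the center
    half = n // 2
    # An N/2 exchange switch at input and output of an interchanger for N/2 slots.
    return [half] + interchanger_stages(half) + [half]


def thompson_mux_demux_switches(n):
    """Switches needed for demultiplexer and multiplexer alone (2N - 2)."""
    return 2 * n - 2


stages = interchanger_stages(1024)
print(len(stages), thompson_mux_demux_switches(1024))  # 19 2046
```

For N = 8 the stage list is [4, 2, 1, 2, 4], which matches the description of delays that decrease toward the center.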


TOOLS  AND  TECHNIQUES 

Simulation  is  an  important  tool  in  realizing  computer  architectures,  which  by  nature 
involve  high  complexity.  The  SCAMP  design  uses  many  clever  tricks  to  reduce  the 
number  of  switches.  Since  clever  tricks  can  backfire,  a  logic  level  verification  of  the 
design  is  the  first  important  step.  The  tool  built  to  do  this  is  an  event  driven  simulator 
called HATCH[21]. As a result of the absence of flip flops in the design, it is a continuous
time simulator. Clocked timing is introduced, as in the actual system, by an object
called  a  clock  which  produces  a  standard  repetitive  signal.  The  HATCH  software  is 
object  oriented  so  that  it  can  evolve  to  meet  new  simulation  needs  by  the  addition  of  new 
object  types  and  methods. 
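
The idea of an event driven, continuous time simulator with a clock object can be illustrated by a minimal sketch. This is not the HATCH implementation; the class names Simulator, Clock and DelayLine and the numerical values are assumptions chosen only for illustration.

```python
import heapq


class Simulator:
    """Minimal event queue ordered by continuous time."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._count = 0  # tie-breaker so equal-time events keep insertion order

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, self._count, action))
        self._count += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()


class Clock:
    """Object producing a standard repetitive signal."""

    def __init__(self, sim, period, sink):
        self.sim, self.half, self.sink, self.level = sim, period / 2, sink, 0
        sim.schedule(0.0, self._toggle)

    def _toggle(self):
        self.level ^= 1
        self.sink(self.level)
        self.sim.schedule(self.half, self._toggle)


class DelayLine:
    """Fiber delay: forwards each level change after a fixed time."""

    def __init__(self, sim, delay, sink):
        self.sim, self.delay, self.sink = sim, delay, sink

    def __call__(self, level):
        self.sim.schedule(self.delay, lambda: self.sink(level))


sim = Simulator()
trace = []
line = DelayLine(sim, 0.5, lambda level: trace.append((sim.now, level)))
Clock(sim, 2.0, line)
sim.run(until=3.0)
print(trace)  # [(0.5, 1), (1.5, 0), (2.5, 1)]
```

New element types are added in this style by defining further classes that accept a level change and schedule their own output events.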

The  first  evolutionary  challenge  met  by  HATCH  was  to  solve  the  delay  distribution 
problem,  described  in  connection  with  Fig.  7.  The  circuit  is  taken  as  a  graph  whose 
nodes are waveguide switches or 3 dB couplers. The edges of the graph represent interconnections
between elements. A delay vector, with one component for each edge,
characterizes the delays in the design. For the lumped delay design, many of these components
are zero. In a real design, each delay vector component is always greater than
some  minimum  which  represents  the  path  length  through  components,  length  of  couplers, 
length  of  the  interconnecting  fiber,  and  for  some  edges,  the  latency  of  photodetector  and 
electrode  driver  circuitry.  The  physical  constraints  are  thus  embodied  in  a  minimum 
delay vector over the edges of the graph. The linear equations ensuring synchronized
signal  arrival  are  derived  from  the  lumped  delay  design.  A  delay  vector  having  each 
component  greater  than  or  equal  to  that  of  the  minimum  vector  and  satisfying  the  linear 
system  is  a  possible  design,  and  that  having  the  least  extra  delay  is  the  solution.  Three 
algorithms for solving this constrained minimization problem were studied and compared:
the simplex method[22], the shortest path method[23], and the local distribution
method[24]. The study[24] showed that the local distribution method converged well,
with delays increasing monotonically up from the lumped delay values to those of the
solution vector. The simplicity of this algorithm gave it better performance than the
other two, so it was included in HATCH to do delay distribution.
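
The conditions defining a possible design can be stated as a small check. The sketch below does not implement any of the three algorithms studied; it only tests a candidate delay vector against the minimum vector and the linear system. The function names, the three edge example and the choice of summed excess as the measure of extra delay are assumptions.

```python
def is_possible_design(delays, min_delays, constraints):
    """True if delays meet the minimum vector and satisfy the linear system.

    constraints is a list of (coefficients, required_total) rows, one per
    synchronization equation derived from the lumped delay design.
    """
    if any(d < m for d, m in zip(delays, min_delays)):
        return False
    return all(
        sum(c * d for c, d in zip(coeffs, delays)) == total
        for coeffs, total in constraints
    )


def extra_delay(delays, min_delays):
    """Total delay added beyond the physical minimum (assumed objective)."""
    return sum(d - m for d, m in zip(delays, min_delays))


# Hypothetical circuit: edges 0 and 1 in series must match parallel edge 2.
# Delays are integers in picoseconds to keep the equality test exact.
min_delays = [300, 200, 800]
constraints = [([1, 1, -1], 0)]
candidate = [300, 500, 800]
print(is_possible_design(candidate, min_delays, constraints),
      extra_delay(candidate, min_delays))  # True 300
```

Among all vectors that pass this check, the solution sought is the one with the least extra delay.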

Signal  quality  management  is  also  included  in  HATCH.  Power  losses  can  make 
ones  appear  to  be  zeros  while  crosstalk  in  the  switches  may  cause  zeros  to  accumulate 
noise and appear to be ones. At each switch control terminal, a threshold decision distinguishes
zeros from ones. A signal restorer must be placed in any optical path from a
standard clock which has enough loss for a logical one to be below threshold or enough
crosstalk for a zero to be above threshold. If loss and crosstalk specifications are associated
with each device, HATCH[25] can compute signal degradation associated with a
specified  path  or  identify  the  worst  case  path.  A  designer  can  thus  use  it  to  add  restoring 
switches  to  a  design  which  assumes  ideal  elements.  This  was  done  for  the  SCAMP 
design  assuming  a  loss  of  -5  dB,  a  crosstalk  of  less  than  -20  dB  and  a  control  terminal 
photodetector  threshold  of  -19  dBm,  obtaining  a  signal  restored  design  for  SCAMP 
requiring  only  75  switches. 
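
The loss side of this calculation can be sketched as follows. This is a simplified, loss only model and not the HATCH procedure: crosstalk is ignored, a restorer is assumed to re-launch the signal at the standard clock level, and the 0 dBm launch power and the eight switch path are hypothetical values.

```python
def place_restorers(element_losses_db, launch_dbm, threshold_dbm):
    """Return indices of path elements that need a restorer in front of them.

    Walks one optical path, subtracting each element's loss in dB, and inserts
    a restorer wherever the next element would pull a logical one below the
    threshold. Assumes no single element loses more than the full margin.
    """
    restorers = []
    level = launch_dbm
    for index, loss in enumerate(element_losses_db):
        if level - loss < threshold_dbm:
            restorers.append(index)
            level = launch_dbm  # restorer output is back at the clock level
        level -= loss
    return restorers


# Eight switches at 5 dB each, -19 dBm threshold, assumed 0 dBm launch power.
print(place_restorers([5.0] * 8, 0.0, -19.0))  # [3, 6]
```

Under these assumptions a logical one survives three switches, so a restorer is needed before every fourth switch on the path.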

The  extended  HATCH  is  a  general  tool  to  design  fiber  optic  and  waveguide  switch 
based  systems.  Starting  with  a  lumped  delay  design,  logic  simulation  with  ideal  gates 
verifies the sequential behavior. Component delays then allow HATCH to produce fiber
lengths for a distributed delay design. When loss and crosstalk specifications are added,
HATCH identifies critical paths for insertion of signal restoring switches. The final
design is then simulated with delay, loss and crosstalk specifications to produce logarithmically
scaled plots of signal amplitudes versus time under worst case loss and
crosstalk  assumptions.  An  overview  of  the  functionality  of  HATCH  is  illustrated  in  Fig. 
12. 

It  has  been  mentioned  that  techniques  for  gate  level  time  multiplexing  can  help 
overcome  the  effects  of  latency.  A  specific  example  is  the  serial  counter  design.  The 
shortest  feedback  loop  in  the  counter  of  Fig.  6  has  a  length  of  one  bit  time.  Since  it 
passes  through  two  switches  and  a  3  dB  coupler,  it  sets  a  lower  limit  on  the  bit  rate  of  the 
counter. Time multiplexing can increase the effective counter bandwidth by multiplexing
more than one independent bit stream on one set of counter hardware. This gives the
effect  of  several  simultaneous  counters,  each  running  at  the  original  bit  rate.  A  block 
diagram  for  this  scheme  with  two  multiplexed  counts  appears  in  Fig.  13.  The  counter 
associated  with  the  bits  in  the  white  boxes  is  about  to  be  incremented  from  3  to  4  while 
that  associated  with  the  stippled  boxes  is  about  to  change  from  8  to  9.  A  carry  feedback 
generated  from  a  bit  at  time  t  need  not  combine  with  a  bit  arriving  at  the  increment  input 
any  sooner  than  two  bit  times  later.  The  two  Wck  input  streams  can  be  multiplexed 
using only differential delay and a 3 dB coupler, and the count outputs can be demultiplexed
by one switch toggling at the effective bit rate.
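
The behavior of two counts sharing one set of counter hardware can be modeled at the bit level. The sketch below is a behavioral model only, not the optical circuit; the four bit word length and the function names are assumptions, and the values 3 and 8 are those of the example in Fig. 13.

```python
def to_serial(value, width):
    """Bits of value, low order bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_serial(bits):
    """Integer value of a serial word given low order bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def increment_interleaved(stream, ways=2):
    """Increment each of `ways` bit-interleaved serial counts by one."""
    # One carry per count: a carry produced at slot t is used at slot t + ways.
    # Starting every carry at 1 models an increment request on every count.
    carries = [1] * ways
    out = []
    for t, bit in enumerate(stream):
        lane = t % ways
        out.append(bit ^ carries[lane])
        carries[lane] &= bit
    return out


white, stippled = to_serial(3, 4), to_serial(8, 4)
stream = [bit for pair in zip(white, stippled) for bit in pair]
result = increment_interleaved(stream)
print(from_serial(result[0::2]), from_serial(result[1::2]))  # 4 9
```

The slices result[0::2] and result[1::2] play the role of the demultiplexing switch, which toggles at the effective bit rate.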

Figure 12: The HATCH Design Support System.


Figure  13:  Two  Independent  Counters  Multiplexed  on  the  Same  Hardware 

EXPERIMENTS 

The  demonstration  of  a  prototype  optical  computer  involves  several  intermediate 
experiments.  From  the  architecture  viewpoint,  a  simple  feedback  state  machine  is  the 
first  step.  We  chose  the  one  out  of  four  scaler  of  Fig.  7  driving  the  counter  of  Fig.  6.  The 
count value delay line demonstrates one word storage, so the second step is the multiword
memory loop, which also requires a binary counter and a serial comparator. The
memory  and  one  word  registers  will  hold  operands  and  result  for  an  arithmetic  unit, 
which  will  be  the  third  subunit  built.  The  instruction  fetch,  decode  and  execute  cycle 
will  be  implemented  last.  The  current  status  of  experiments  is  between  the  counter  and 
memory  demonstrations. 

The  scaler  of  Fig.  7  is  a  feedback  state  machine,  but  is  simpler  than  the  counter 
because  it  is  self  stabilizing  if  a  bit  is  lost  or  gained.  An  infrequent  bit  error  in  the  scaler 
would  go  undetected  on  an  oscilloscope.  The  counter,  on  the  other  hand,  has  a  period  of 
64  bits,  and  single  bit  errors  have  a  large  influence  on  its  output.  The  optical  scaler  and 
binary  counter  combination  has  been  built  and  tested[26]  yielding  the  output  waveform 
for  a  50  MHz  clock  rate  shown  in  Fig.  14.  The  complemented  count  available  at  the 
unused output of SWS in Fig. 6 is shown, low order bit first reading left to right. Changing
two fiber lengths yields a six bit, scale of 64, counter, and this device was also built
and operated at a 50 MHz clock rate. A modified design with a shortened carry feedback
loop was built and operated at 100 MHz. The technique of using time multiplexing to
increase the effective counter bandwidth was also applied to obtain 100 MHz operation
by interleaving two independent count values, independently incremented by interleaved
Wck signals. The dual counter was also operated successfully at an effective 100 MHz
rate.

Figure 14: Output of the Four Bit Optical Counter


CONCLUSIONS 

The  work  discussed  here  primarily  exploits  the  time  domain  to  make  potential  use 
of high optical bandwidth, although the packet switched communications network also
includes  significant  spatial  parallelism.  If  this  work  does  not  directly  address  the  use  of 
spatial  parallelism  in  optics,  it  also  does  not  conflict  with  it.  The  ideas  of  speed  scalable 
architectures  should  ideally  be  combined  with  the  parallel  optical  designs  being  pursued 
effectively  by  other  groups  [27]  [28]  using  synchronous  operation  and  latching  gates. 
The most parallel system running at the highest possible speed is the ideal optical computer,
although the time slot interchanger shows that at least some interesting systems are
strictly  serial. 

This  work  also  does  not  directly  address  the  problem  of  producing  or  using  an  ideal 
optical  device,  which  is  fast,  small,  can  be  highly  integrated,  and  uses  little  power.  These 
architectures  would  smoothly  scale  up  in  speed  with  the  availability  of  such  a  device,  but 
our  work  has  no  device  development  component  as  such.  The  size,  speed  and  cost  of  the 
LiNbO3 waveguide switches are such that the specific implementation of the prototype
architecture discussed here would not be competitive as a general purpose computer,
although it could have special purpose application as a very high speed controller in systems
where data is optical to begin with.

Optical  computing  helps  in  understanding  the  architectural  problems  associated 
with  very  high  speed  digital  computing.  Electromagnetic  radiation  and  induction  effects 
are  avoided,  and  experimental  demonstrations  of  both  communications  and  switching  at 
terabit  per  second  bandwidths  exist.  Current  digital  architectures  are  heavily  influenced 
by the assumptions of arbitrary fanout and instantaneous signal propagation within
moderately  complex  subsystems.  As  switching  speeds  become  faster  and  power  more  of 
a  concern,  both  assumptions  prevent  architectures  from  scaling  up  in  speed.  This  work 
involves  latch-free  designs  in  which  finite  signal  propagation  time  is  fundamental.  Such 
speed  scalable  designs  can  take  advantage  of  higher  speed  devices  as  they  become  avail¬ 
able.  Tools  such  as  the  delay  distribution  algorithms  are  essential  to  this  style  of  design. 
Optics provides an excellent environment in which to study speed of light limited architectures,
which are becoming of increasing concern in electronic computer design also.

The  systems  described  here  are  not  general  purpose  supercomputers.  The  results 
show  that  designing  an  optical  computer  involves  much  more  than  simply  inserting  an 
"optical  transistor"  into  an  existing  design.  The  maturity  and  commercial  development 
of  digital  electronics  suggests  that  an  all-optical  computer  is  not  imminent.  Optics  will 
probably  find  its  way  gradually  into  digital  computers,  starting  from  the  fibers  already 
used  to  connect  cabinets  in  large,  high  speed  systems.  Although  optical  architectures 
may  well  be  different  from  electronic  ones  in  important  respects,  they  will  probably  build 
on the digital design knowledge base on which electronic computers rest. Optical computers
will eventually combine spatial parallelism with high speed design constrained by
the speed of light limit, as will future electronic computers. In the meantime, a better
understanding of speed of light limited digital systems shows great promise for immediate
applications. Communications systems can benefit from even limited optical processing.
Time critical tasks in signal processing are another area in which significant applications
may exist. Perhaps even more important is the fact that the speed of light limit is a
universal phenomenon, not just an optical one. By studying the time-space tradeoffs in
the optical domain, insight may be gained into the fundamental nature of physical realizations
of the mathematical model which constitutes computation.